Abstract — This paper studies the low-rank matrix completion
problem from an information theoretic perspective. The completion
problem is rephrased as a communication problem of an
(uncoded) low-rank matrix source over an erasure channel. The
paper then uses achievability and converse arguments to present
order-wise optimal bounds for the completion problem.

Index Terms — The Netflix prize 

I. Introduction 

The low-rank matrix completion problem has been fairly 
well-studied in literature 13J, El, HI, 0, 0, with both 
algorithms for matrix completion and an analysis of the limits 
within which this is possible El, Q. In f?!, the authors present 
optimality results quantifying the minimum number of entries 
needed to recover a matrix of rank r (using any possible 
algorithm). Also, under certain incoherence assumptions on 
the singular vectors of the matrix, [4] shows that recovery is
possible by solving a convenient convex program as soon as
the number of entries is of the order of the bound (within
polylog factors). The authors of [4] utilize a combination of
multiple mathematical principles along with an optimization 
approach to determining these limits. In this paper, we study 
the low-rank matrix completion problem using a formulation 
similar to an information-theoretic coding problem and obtain 
achievability and converse bounds on near-perfect low-rank 
matrix completion that are similar to those in [4]. This
reformulation of the low-rank matrix completion problem as a
communication/compression problem enables us to generalize
the near-perfect matrix completion problem to one which
incorporates alternate models such as noise and distortion, 
and helps us gain insights into the connections between 
information-theoretic principles and matrix completion prob- 
lems. 

In fT], the authors show that to reconstruct a matrix of rank r 
within an accuracy $\delta$, $C(r, \delta)n$ observations are sufficient.
Results on low-rank matrix completion with noise are presented
in [6] (and citations therein). The lossy matrix completion
problem bears a strong resemblance to a quantization/rate 
distortion problem, while the low-rank matrix completion with 
noise problem has close intuitive connections with a channel 
coding problem. This paper is aimed at being a first step in 
making these connections more concrete. 

This work is supported by NSF grants CCF-0934924, CCF-0916713 and 
CCF-0905200. 



One of the intuitive connections between conventional 
information-theoretic coding theorems and the low-rank matrix 
completion problem is the "erasure-source-channel" perspec- 
tive. The analogy can be drawn between the two as follows: 
Consider a system where the transmit source is fixed to be the
set of all $m \times n$ matrices of rank $r$ or less. When the source
is transmitted (in an uncoded fashion), the "communication
channel" causes random erasures in $k$ positions of each
transmitted matrix. The goal is to recover the original source
with high probability at the receiver. The matrix completion
problem is then rephrased as: how large can the number of 
random erasures k be so that it is possible to distinguish 
each element of the matrix-source with high probability at the 
receiver? Although we do not explicitly use this reformulation 
of the matrix completion problem in our theorem statement 
or proofs, it is a useful analogy to remember as we proceed
through the remainder of the paper.

Note that the low-rank matrix completion problem setting 
is in some ways different from conventional source and/or 
channel coding literature. For example, it does not incorporate
an encoding process. The source (the set of rank-$r$ matrices) is
directly transmitted, and the channel is tightly coupled with the
source. Regardless of differences, we endeavor in this paper
to highlight the similarities and to point out that many of the 
existing tools in information and coding theory [H may be 
directly applicable to addressing problems in the domain of 
matrix completion. 

In summary, the main contributions of this paper are: 

1) Bounds using tools from information theory for low-rank
matrix completion.

2) For an $m \times m$ matrix of rank $r$, a lower bound of
$\Omega(m)$ and an achievable near-perfect reconstruction with
$O(m \log m)$ randomly chosen samples (for large $m$ and
large alphabet size).

3) Lower bounds for matrix reconstruction with distortion
constraints using concepts from rate-distortion.

The rest of this paper is organized as follows: the next 
section formally presents the system model. In Section III we 



study both the achievability and converse bounds for the case 
of near-perfect matrix completion. In Section IV we present
lower bounds for the case when we desire to learn low-rank
matrices within a distortion constraint. We conclude the paper
with Section V.



II. System Model 

First, a note on the notation used in this paper: $\mathcal{S}$ denotes
a set and $|\mathcal{S}|$ denotes the cardinality of the set $\mathcal{S}$. $Y^n$ denotes
a vector of $n$ entries $Y_1, \ldots, Y_n$, while $Y_j^k$ for $j < k$ denotes
the subvector $Y_j, \ldots, Y_k$. $S$ is used to denote both the random
variable and a particular realization of it. $\Pr(\cdot)$ denotes the
probability of a certain event.

Let $\mathcal{S}$ be the set of all $m \times m$ matrices with the following
structure:

$$S = UV \qquad (1)$$

$\forall S \in \mathcal{S}$. Here, $U$ is an $m \times r$ matrix, and $V$ is an $r \times m$
matrix. The entries of $U$ and $V$ are assumed to belong to the
finite field $\mathbb{Z}_q$, and the matrix multiplication is defined over
the integers $\mathbb{Z}$.
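As an illustration of the source model in (1), the following minimal sketch draws $U$ and $V$ with entries in $\{0, \ldots, q-1\}$ and multiplies them over the integers. The function name `random_low_rank_matrix`, the parameter values, and the use of Python's `random` module are assumptions made for this example and are not part of the paper.

```python
import random

def random_low_rank_matrix(m, r, q, rng):
    """Hypothetical helper: draw U (m x r) and V (r x m) with entries
    uniform on {0, ..., q-1} and return S = U V computed over the integers."""
    U = [[rng.randrange(q) for _ in range(r)] for _ in range(m)]
    V = [[rng.randrange(q) for _ in range(m)] for _ in range(r)]
    S = [[sum(U[i][k] * V[k][j] for k in range(r)) for j in range(m)]
         for i in range(m)]
    return U, V, S

rng = random.Random(0)          # fixed seed, for reproducibility only
m, r, q = 6, 2, 5               # illustrative sizes, not from the paper
U, V, S = random_low_rank_matrix(m, r, q, rng)

# Every entry of S is a sum of r products of values below q,
# so it is at most r * (q - 1)^2, which is below r * q^2.
max_entry = max(max(row) for row in S)
```

The entry bound checked here is the same one the converse argument relies on when it bounds the per-sample entropy by $\log(rq^2)$.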

We make two assumptions on S for the sake of simplicity: 
first, that it is of equal dimension m x m. Second, that the 
entries of $U$ and $V$ are assumed to be drawn uniformly and
independently from $\mathbb{Z}_q$. Both of these assumptions can be
relaxed relatively easily. Making such assumptions helps us 
derive relatively uncomplicated expressions for the relation- 
ships between system parameters that resemble those in ID, 
Q. The expressions would be considerably more involved for 
more general models. 

Note that the set $\mathcal{S}$ contains matrices of rank $r$ or less. However,
as the size of the alphabet ($q$) increases, the probability
that $S = uv$ with $(u, v) \in (\mathcal{U}, \mathcal{V})$ has a rank less than $r$ diminishes.

For any $S \in \mathcal{S}$, we are given $n$ randomly chosen (without
replacement) values from $S$. We use $Y^n$ to represent the values
of the matrix $S$ at those locations. We denote the locations
$(i_k, j_k)$, $1 \le k \le n$, that were sampled as the vector $Z^n$.
From $Y^n$ and $Z^n$, we desire to recover $S$. For a given value
of $n$ and a recovery function $\hat{S} = g(Y^n, Z^n)$, we define

$$P_e = \Pr\left[\hat{S} \neq S \mid Y^n\right]$$
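The sampling model can be sketched as follows. This is an illustrative example only: the helper names `sample_entries` and `is_consistent`, the small rank-1 test matrix, and the seed are assumptions introduced here, and the consistency check is a building block that a recovery function $g$ might use, not the paper's decoder.

```python
import random

def sample_entries(S, n, rng):
    """Hypothetical helper: pick n distinct locations of the square matrix S
    uniformly at random (without replacement) and return (Y, Z), where
    Z lists the sampled (i, j) locations and Y the values of S there."""
    m = len(S)
    positions = [(i, j) for i in range(m) for j in range(m)]
    Z = rng.sample(positions, n)
    Y = [S[i][j] for (i, j) in Z]
    return Y, Z

def is_consistent(S_hat, Y, Z):
    """True if the candidate S_hat matches every observed value."""
    return all(S_hat[i][j] == y for y, (i, j) in zip(Y, Z))

# A small rank-1 example matrix (an assumption made for this sketch).
m = 4
S = [[(i + 1) * (j + 1) for j in range(m)] for i in range(m)]
rng = random.Random(1)
n = 7
Y, Z = sample_entries(S, n, rng)
```

A decoder in the spirit of the achievability argument would search a candidate set for matrices passing `is_consistent` and succeed when exactly one candidate does.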



As in conventional analysis, we consider the probability of
error averaged over all $S \in \mathcal{S}$. In the case when we desire
near-perfect recovery of the matrix $S$, we desire that $n$ be
large enough such that there exists a decoding function $g$
with "small" $P_e$. This is similar to a lossless source-recovery
problem setting. We refer to this as near-perfect recovery,
as there is a finite (but arbitrarily small) probability that
the recovery process will fail. This problem formulation is
analyzed in further detail in Section III.



Alternatively, we may impose a distortion constraint on the 
recovery process 

$$E[d(S, \hat{S})] \le D$$

where $d(s, \hat{s})$ is a suitable distortion function. Again, we
desire that I be large enough such that the reconstructed
$\hat{S}$ meets this distortion requirement. This bears a strong
resemblance to a rate distortion problem setting and lower
bounds for it are studied in greater detail in Section IV.



In this paper, we determine the relationship between $m$ and
$n$ that is required to recover $S$ within the appropriate constraint
for a given fixed rank $r$. We determine this relationship in
the order sense when all of $m$, $n$ and alphabet size $q$ are
sufficiently large.

III. Near-Perfect Matrix Recovery 

In this setting, we desire that, for any $\epsilon > 0$, there exist an
$n$ and correspondingly, an I sufficiently large such that, on
average across all elements of $\mathcal{S}$, the elements of $\mathcal{S}$ can be
recovered with a probability of error $P_e < \epsilon$.

Theorem 3.1: Given an $n$-length sampled sequence $Y^n$ and
sampled locations $Z^n$, a matrix from $\mathcal{S}$ can be reconstructed
with high probability only if

$$n = \Omega(m).$$

Moreover, if $n = O(m \log m)$, a reconstruction algorithm
exists that will determine $S$ accurately with high probability.
Specifically, given a target probability of error $\epsilon$ and
a finite rank $r$, there exists an $m$, $q$ large enough and an
$n = O(m \log m)$ such that $P_e < \epsilon$.

Proof: In the same spirit as a channel-coding theorem, this 
proof incorporates both an achievability and a converse com- 
ponent. We begin with the converse argument: 

A. Converse 

From Fano's inequality E), we have that 

$$H(S \mid Y^n, Z^n) \le P_e \log|\mathcal{S}| \le P_e (2rm \log q)$$

Therefore, we have:

$$\begin{aligned}
H(S) &\overset{(a)}{=} H(S \mid Z^n) \\
&\overset{(b)}{\le} I(S; Y^n \mid Z^n) + P_e (2rm \log q) \\
&\overset{(c)}{=} H(Y^n \mid Z^n) - H(Y^n \mid S, Z^n) + P_e (2rm \log q) \\
&\overset{(d)}{=} H(Y^n \mid Z^n) + P_e (2rm \log q) \\
&\overset{(e)}{\le} n \log(rq^2) + P_e (2rm \log q)
\end{aligned}$$

where (a) follows from the independence between $S$ and
$Z^n$, (b) from Fano's inequality, (c) from the chain rule
on mutual information and (d) from the fact that $Y^n$ is a
deterministic function of $S$ given $Z^n$. Finally, (e) follows from
the realization that each entry of any matrix $S \in \mathcal{S}$ has a
maximum value of $rq^2$ (from the definition of $S$ in Equation
(1)). So we have,

$$H(Y^n \mid Z^n) \le \sum_i H(Y_i \mid Z_i) \le n \log(rq^2)$$

But, we also have that

$$H(S) = H(UV) \ge H(UV \mid V) = H(U) = mr \log q$$

Thus we must have

$$mr \log q \le n \log(rq^2) + P_e (2rm \log q)$$

So for $P_e$ arbitrarily small, an $n = \Omega(m)$ is necessary for
reconstruction. □

Note that this can also be seen directly using a fairly 
intuitive and straightforward degrees-of-freedom argument for 
the system. 
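To make the scaling in the converse concrete, the final inequality $mr \log q \le n \log(rq^2) + P_e(2rm \log q)$ can be rearranged into a lower bound on $n$ and evaluated numerically. The sketch below is an illustration only; the function name `converse_min_samples` and the example parameter values are assumptions, and logarithms are taken base 2.

```python
import math

def converse_min_samples(m, r, q, p_e):
    """Lower bound on n implied by
        m r log q <= n log(r q^2) + P_e (2 r m log q),
    i.e. n >= r m log q (1 - 2 P_e) / log(r q^2)."""
    return r * m * math.log2(q) * (1 - 2 * p_e) / math.log2(r * q * q)

# Illustrative values (assumptions, not taken from the paper).
r, q, p_e = 5, 1024, 0.01
bound_1k = converse_min_samples(1000, r, q, p_e)
bound_2k = converse_min_samples(2000, r, q, p_e)
```

For fixed $r$, $q$ and $P_e$ the bound is linear in $m$, which is the $\Omega(m)$ behavior stated in Theorem 3.1.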

Next, we proceed to the achievability argument. 



B. Achievability 

The achievability argument is the more involved component 
of this proof. Define A™ (5) as the set of all e-typical matrices 
S G S generated in accordance with Q. First, we define the 
sets ID: 

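$$A_\epsilon^{rm}(U) = \left\{ u \in \mathcal{U} : \left| -\frac{1}{rm} \log p(u) - \log q \right| < \epsilon \right\}$$

$$A_\epsilon^{rm}(V) = \left\{ v \in \mathcal{V} : \left| -\frac{1}{rm} \log p(v) - \log q \right| < \epsilon \right\}$$

$$A_\epsilon^{rm}(S) = \left\{ s = uv,\ u \in A_\epsilon^{rm}(U),\ v \in A_\epsilon^{rm}(V) \right\} \qquad (2)$$

Note that $|\mathcal{U}| = |\mathcal{V}| = 2^{rm \log q}$. Therefore, we have that:

$$|A_\epsilon^{rm}(S)| \le |\mathcal{S}| \le 2^{2rm \log q}$$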

Now, we sample the set A™ (5), dropping 2^'"™'^ of its 
entries at random to generate the set T- Thus, we have 

$$|\mathcal{T}| \le 2^{2rm(\log q - \delta)}$$

Now, given that the "received vector" $(Y^n, Z^n)$ resulted from
a matrix $S \in \mathcal{S}$, we "decode" the sparse matrix as follows:
we determine all $\hat{S} \in A_\epsilon^{rm}(S)$ that match the values $Y^n$ in the
positions corresponding to $Z^n$. We declare success if a unique
$\hat{S}$ is found, and declare an error if:

1) The event $E_0$ occurs, which is $S \notin \mathcal{T}$, or

2) The event $E_{S'}$ occurs: there exists $S' \neq S$, $S' \in \mathcal{T}$, that
agrees with $S$ in the positions $Z^n$.

The overall probability of error is given by

$$P_e = \Pr\left( E_0 \cup \bigcup_{S' \in \mathcal{T},\, S' \neq S} E_{S'} \right)$$

It follows from AEP IS) that: 

$$\Pr(\mathcal{T}) \ge 1 - \gamma(\delta) \qquad (3)$$

where $\gamma(\delta)$ goes to zero as $\delta$ and $m \to \infty$.
Therefore,

$$P_e \le \gamma(\delta) + \sum_{S' \in \mathcal{T},\, S' \neq S} \Pr(E_{S'}).$$

It is important to note that, for a particular value of $Y^n = y^n$,
$Z^n = z^n$, either the event $E_{S'}$ occurs (with probability 1)
or it does not occur at all. The key step here is to average this
over all realizations of $Y^n$ and all possible sampling strategies
$Z^n$.

To determine the remainder of Pe, we need the following 
two lemmas: 

Lemma 3.2: Let $A = [a_1, a_2, \ldots, a_r]$ be a random vector
uniformly chosen over $\mathbb{Z}_q$, and let $C$ be an $r \times r$ random
matrix with entries from $\mathbb{Z}_q$ with the $i$th column denoted as
$C_i$. Then, for any $\beta > 0$, there exists a $q$ sufficiently large
such that:

$$H(C_r A \mid C_1 A, C_2 A, \ldots, C_{r-1} A, C) \ge \left(1 - \frac{r^2}{q}\right) \log q \ge \log q - \beta.$$

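Proof: Note that:

$$H(C_r A \mid C_1 A, \ldots, C_{r-1} A, C) = H(CA \mid C) - H(C_1 A, \ldots, C_{r-1} A \mid C)$$

As noted in [11], [12], the probability that $C$ is not invertible
(for both integer-valued and finite-field matrices) diminishes
at least as $r/q$ (Schwartz-Zippel lemma). Thus

$$H(CA \mid C) \ge \left(1 - \frac{r}{q}\right) r \log q$$

and

$$H(C_1 A, C_2 A, \ldots, C_{r-1} A \mid C) \le (r-1) \log q$$

Subtracting the second bound from the first leaves
$(1 - r^2/q) \log q$. Thus we have the result. □

Note that

$$H(C_r A \mid \mathcal{C} A, C) \ge H(C_r A \mid C_1 A, C_2 A, \ldots, C_{r-1} A, C)$$

where $\mathcal{C}$ is any subset of $C_1, C_2, \ldots, C_{r-1}$. Therefore, we
must have:

$$H(C_r A \mid \mathcal{C} A, C) \ge \log q - \beta$$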
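The invertibility step used in the proof of Lemma 3.2 can be checked by brute force for very small cases. The sketch below counts singular $r \times r$ matrices over a prime field and compares the fraction with $r/q$. It is an illustration under stated assumptions: it covers only the finite-field case with prime modulus `p` (not the integer-valued case the paper also mentions), and the helper names `det_mod` and `singular_fraction` are introduced here for the example.

```python
from itertools import product

def det_mod(matrix, p):
    """Determinant of a square matrix modulo a prime p,
    computed by Gaussian elimination over the field Z_p."""
    a = [[x % p for x in row] for row in matrix]
    n = len(a)
    det = 1
    for col in range(n):
        pivot = next((row for row in range(col, n) if a[row][col] != 0), None)
        if pivot is None:
            return 0                      # no pivot: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                    # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)     # modular inverse (Python 3.8+)
        for row in range(col + 1, n):
            factor = a[row][col] * inv % p
            for k in range(col, n):
                a[row][k] = (a[row][k] - factor * a[col][k]) % p
    return det % p

def singular_fraction(r, p):
    """Fraction of all r x r matrices over Z_p that are singular,
    found by enumerating every matrix (feasible only for tiny r and p)."""
    singular = 0
    for entries in product(range(p), repeat=r * r):
        matrix = [list(entries[i * r:(i + 1) * r]) for i in range(r)]
        if det_mod(matrix, p) == 0:
            singular += 1
    return singular / p ** (r * r)

frac_3 = singular_fraction(2, 3)
frac_5 = singular_fraction(2, 5)
```

In both enumerated cases the singular fraction stays below $r/q$ and shrinks as the alphabet grows, which is the behavior the lemma relies on for large $q$.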

Lemma 3.3: For an arbitrary ^ > 0, there exists an n2 7™2 
such that, for n > n2 and m > TO2, we have: 



Pt{E'JZ'' = 0") < 2 
Proof: Note that: 



$$\Pr(\exists S' \in \mathcal{T} : S' \neq S \mid Z^n = z^n) = \Pr(y^n \mid Z^n = z^n). \qquad (4)$$

In words, the probability that two distinct elements of T 
agree in a given set of n randomly chosen places is equal 
to the probability of that particular set of values, across all 
possibilities when sampling matrices in T. 

The second part of the proof is essentially the Shannon- 
McMillan-Breiman Theorem (SMB) for discrete-time discrete- 
valued sources with minor modifications. As the proof of the 
SMB theorem is fairly involved, we refer the reader to the 
sandwich proof by Algoet and Cover |l9|, which is summarized 
in UOJ. 

Next, we quantify $H(Y^n \mid Z^n = z^n)$. To do so, note that

$$H(Y_n \mid Y^{n-1}, z^n) \ge \max\{ H(Y_n \mid Y^{n-1}, z^n, V),\ H(Y_n \mid Y^{n-1}, z^n, U) \}$$

which follows from the fact that conditioning cannot increase
entropy. Next, we determine $H(Y_n \mid Y^{n-1}, Z^n = z^n, V)$,
noting that an analogous exercise holds for $H(Y_n \mid Y^{n-1}, Z^n = z^n, U)$.
Given $V$, $Y_n$ is a linear combination of the entries of
a row of $U$ using known coefficients from $V$ chosen through
$Z_n = z_n$. Let this row be denoted as $U_i$. If $z^{n-1}$ causes $Y^{n-1}$
to contain $r - 1$ or fewer linear combinations of $U_i$, then from
Lemma 3.2 we have that:

$$H(Y_n \mid Y^{n-1}, Z^n = z^n, V) \ge \log q - \beta.$$

Otherwise, we use the trivial lower bound
$H(Y_n \mid Y^{n-1}, Z^n = z^n, V) \ge 0$.

A similar inequality holds for $H(Y_n \mid Y^{n-1}, Z^n = z^n, U)$ if
$z^{n-1}$ causes $Y^{n-1}$ to have $r - 1$ or fewer linear combinations of the
particular column of $V$. In case we have $r$ or more linear
combinations, we assign $H(Y_n \mid Y^{n-1}, Z^n = z^n, U) \ge 0$.

For the remainder of the achievability argument (Equation
(7)), we desire that the number of samples $n$ be such that

$$H(Y^n \mid Z^n = z^n) \ge 2rm(\log q - \beta). \qquad (5)$$

Note that the upper limit on $H(Y^n \mid Z^n = z^n)$ is $2rm \log q$,
and thus (5) is "close" to this limit for small $\beta$. This may or
may not hold, depending on $z^n$. We require that $n$ be large
enough so that a "typical" $Z^n$ results in each row and column
of the sampled matrix having at least $r$ entries.

Let $G_n$ denote the set of all sampling sequences $Z^n = z^n$
that include at least $r$ entries in each row and column. We
designate a new error event to include all $z^n$s that do not
meet this requirement. We show that $n = O(m \log m)$
is sufficient to ensure that $G_n$ occurs with high probability.
This problem resembles the scenario where we have n balls 
and m bins, each bin with a capacity limited to a total of m 
balls. We place the n balls uniformly randomly in the m bins 
sequentially, eliminating bins that are at capacity. We desire 
that the probability of any bin having r — 1 or less balls be 
small. 

In the analysis that follows, we drop the max capacity of
$m$ per bin as it can only lead to a larger value for $n$ to satisfy
the requirement that each bin have at least $r$ balls, and study
the problem of placing $n$ balls randomly in $m$ bins. Let
$n = \alpha m \log m$ for any $\alpha > 2$. Then we have that the average
number of balls in each bin is $\alpha \log m$. If $W_i$ is the number
of balls in Bin $i$, using a Chernoff bound we have:



Pr(Wi <r) <€" 



< 



1 



1/2 



Hence, the probability that any row or column of the
sampled matrix has fewer than $r$ entries is upper bounded
by:



2mPT{Wi <r) < 



2m 



which diminishes as $m$ increases. Let $m_2$ be such that, for
all $m > m_2$, we have

$$2m \Pr(W_i < r) < \tau \qquad (6)$$

for an arbitrary $\tau > 0$. As mentioned before, we declare
an error when the sampled matrix is such that there are fewer
than $r$ entries in any row or column. Therefore, the overall
probability of error expression can be upper bounded as:

Pe < l{S) + Pr(Z" ^ Gn) 

+ Pr(Z" e G„)2"^^'^"'^"='"^~i^^|r| 
From (|5]l and (|6|, when n — am log to we have 

Pe < 7(5) +T + 2-(2™'5-™/3-"™log™T5|^;r) (7) 



Thus, as long as we choose 

^> 2 + 27' 

there exists an TO3 large enough such that, for all m > TO4 we 
have: 

2— (2rm(5 — — am^) ^ ^ 

for some A > 0. Thus for an to large enough, Pe < e for 
any e > 0. This concludes the achievability proof. □ 

Thus, the overall result is established. Note that there is 
a log factor gap between the lower and upper bounds on 
matrix completion. This log-factor ensures enough entries are 
sampled from each row and column of the matrix. If a more
systematic sampling method $Z^n$ was adopted for obtaining
$Y^n$ from the matrix $S$ than just random sampling, then this
log-factor may not be essential for near-perfect reconstruction.
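The role of the log factor can be illustrated numerically with the simplified balls-and-bins model used in the achievability argument (bin capacity dropped, so each bin count is binomial). The sketch below evaluates the union bound $2m \Pr(W_i < r)$ exactly for a sample budget of order $m \log m$ and for one of order $m$. The function names, the choice of natural logarithm, and the values of `m`, `r`, and `alpha` are assumptions made for this example.

```python
import math

def prob_fewer_than_r(n, m, r):
    """Exact Pr(W < r) for W ~ Binomial(n, 1/m): the chance that a given
    bin receives fewer than r of n uniformly thrown balls."""
    p = 1.0 / m
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r))

def union_bound(n, m, r):
    """Union bound over m rows and m columns, as in 2m Pr(W_i < r)."""
    return 2 * m * prob_fewer_than_r(n, m, r)

m, r, alpha = 1000, 2, 3                       # illustrative values only
n_log = math.ceil(alpha * m * math.log(m))     # n = alpha * m * log m
n_lin = 3 * m                                  # n linear in m, no log factor

bound_with_log = union_bound(n_log, m, r)
bound_without_log = union_bound(n_lin, m, r)
```

Under these assumptions the bound is small with the log factor and vacuous (larger than 1) without it, consistent with the remark that the gap comes from covering every row and column under purely random sampling.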

IV. Matrix Reconstruction Under Distortion 
Constraints 

Next, we present lower bounds when we do not desire 
perfect reconstruction but allow for a distortion between the 
original matrix source S and the reconstruction S as given 
by (j9|l. We base this lower bound on principles from rate- 
distortion theory. The achievability argument is fairly involved 
and is therefore relegated to a future paper. We provide the
lower bound as it is relatively straightforward to obtain and
it illustrates the application of concepts from rate distortion 
theory to matrix reconstruction. 

In this section, we present lower bounds under two settings 
- when the alphabet is discrete (under Hamming distortion) 
and when it is continuous (under squared error distortion). 

A. Case 1: Discrete source with Hamming distortion 
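Here, we desire to determine a bound on $n$ such that

$$E\left[ \sum_{i,j} \mathbb{1}(S \neq \hat{S})_{i,j} \right] \le D m^\beta$$

Intuitively, we desire that the matrices $S$ and $\hat{S}$ differ in $Dm^\beta$
places on average. To determine the lower bound, we have
the following inequalities:

$$\begin{aligned}
I(S; Y^n \mid Z^n) &= H(S \mid Z^n) - H(S \mid Y^n, Z^n) \\
H(S \mid Z^n) - H(S \mid \hat{S}, Z^n) &\overset{(a)}{\le} I(S; Y^n \mid Z^n) \\
I(S; \hat{S} \mid Z^n) &\le H(Y^n \mid Z^n) - H(Y^n \mid S, Z^n) \\
I(S; \hat{S} \mid Z^n) &\overset{(b)}{\le} H(Y^n \mid Z^n)
\end{aligned}$$

where (a) follows from the fact that $\hat{S}$ is a function of
$Y^n, Z^n$, and (b) from the fact that $Y^n$ and $\hat{S}$ must agree on
the positions given by $Z^n$ to minimize distortion.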

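Now we have:

$$\begin{aligned}
I(S; \hat{S} \mid Z^n) &= H(S) - H(S \mid \hat{S}, Z^n) \\
&\ge H(S) - H(S - \hat{S})
\end{aligned}$$

If $T = S - \hat{S}$, the distortion constraint requires that $T$ be a
matrix with at most $Dm^\beta$ non-zero values, with a range of at
most $-rq^2$ to $rq^2$. From the maximum entropy theorem, we
have

$$H(T) \le Dm^\beta \log(2rq^2)$$

and so,

$$\begin{aligned}
I(S; \hat{S}) &\ge H(S) - Dm^\beta \log(2rq^2) \\
&\ge 2rm(\log q - \delta) - Dm^\beta \log(2rq^2) \qquad (8)
\end{aligned}$$

Combining (8) and realizing that $H(Y^n) \le n \log(rq^2)$, we
have

$$n \log(rq^2) \ge 2rm(\log q - \delta) - Dm^\beta \log(2rq^2) \qquad (9)$$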

Remark 4.1: Note that if $\beta > 1$, then the lower bound (9), if
tight, indicates that lossy reconstruction may be possible with
a constant or polylog number of samples. However, the lower
bound may not be tight in that regime and an achievability
argument is needed to indicate if this is possible.
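The regime described in Remark 4.1 can be seen by evaluating the right-hand side of (9) numerically. The following sketch is illustrative only: the function name `hamming_lower_bound`, the base-2 logarithm, and all parameter values are assumptions chosen for the example.

```python
import math

def hamming_lower_bound(m, r, q, D, beta, delta):
    """Lower bound on n obtained by dividing (9) by log(r q^2):
        n >= [2 r m (log q - delta) - D m^beta log(2 r q^2)] / log(r q^2).
    A non-positive value means the bound places no constraint on n."""
    numerator = (2 * r * m * (math.log2(q) - delta)
                 - D * m**beta * math.log2(2 * r * q * q))
    return numerator / math.log2(r * q * q)

# Illustrative values (assumptions, not taken from the paper).
m, r, q, D, delta = 10**6, 5, 1024, 1.0, 0.1
bound_small_beta = hamming_lower_bound(m, r, q, D, beta=0.5, delta=delta)
bound_large_beta = hamming_lower_bound(m, r, q, D, beta=1.5, delta=delta)
```

With $\beta < 1$ the first term dominates and the bound still grows linearly in $m$; with $\beta > 1$ the distortion term dominates for large $m$ and the bound becomes vacuous, which is why an achievability argument is needed in that regime.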

B. Case 2: Continuous source with the squared norm 

To illustrate the usefulness of the information-theoretic 
formulation, we consider the problem of reconstructing a 
matrix from a continuous alphabet. In this case, the source 
is any continuous valued matrix source of rank r with a finite 
(differential) entropy rate: 

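$$h^*(S) = \lim_{m \to \infty} \frac{h(S)}{rm}$$

Our distortion constraint is given by

$$E\left[ \sum_{i,j} (S - \hat{S})^2_{i,j} \right] \le D m^\beta \qquad (10)$$

By the data processing inequality,

$$I(S; \hat{S} \mid Z^n) \le I(S; Y^n \mid Z^n)$$

$$\begin{aligned}
I(S; \hat{S} \mid Z^n) &= h(S) - h(S \mid \hat{S}, Z^n) \qquad (11) \\
&\ge h(S) - h(S - \hat{S}) \qquad (12)
\end{aligned}$$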

(13) 

Let $E = S - \hat{S}$ denote the error matrix. What we desire is
to determine the maximum entropy rate of $E$ such that the
entries of $E$ satisfy (10). Thus, the optimization problem is as
follows:

max h{E) 

such that 

Let $f(E)$ denote the 'true' joint distribution of the entries
of $E$, and let $g(E)$ be a Gaussian distribution over entries
of $E_{ij}$ such that $E_{ij}$ are independent with mean zero and
variance $\sigma^2$ given by $\sigma^2 = D$.

We pick the remainder of $E_{ij} = 0$. Given these, we have:

DifWg) = -h{f) + ElE^ + m^^\og{27:D) 
Note that D{f\\g) > 0, and so 

M/)< '^log(2W) 
Substituting into this expression, we get, for /? < 2, 

h{E) < — log(27rei?) 

and therefore the resulting bound on the rate distortion func- 
tion is: 

I{S; S) > rmh* {S) - log(27re£') 

Remark 4.2: Note that if the distortion constraint $D = 0$,
then reconstruction is impossible unless $n = m^2$. It is also
trivial to see that for $\beta > 2$, only a few samples are required
asymptotically for reconstruction.

V. Conclusion 

In this paper, we consider an information-theoretic formula- 
tion of the low-rank matrix completion problem. By using this 
formulation, we derive lower bounds on matrix reconstruction, 
and an upper bound in the case of near-perfect reconstruction. 

A point to note is that this paper does not provide
low-complexity mechanisms for matrix reconstruction as in [4],
0, 13, Q. In spite of this, this connection with information 
theory proves useful in analyzing the limits of matrix recon- 
struction under different models and constraints. 